ablation method
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.67)
Optimal ablation for interpretability
Interpretability work in machine learning (ML) seeks to develop tools that make models more intelligible to humans in order to better monitor model behavior and predict failure modes. Early work in interpretability sought to identify relationships between model outputs and input features (Ribeiro et al., 2016; Covert et al., 2022), but with only black-box query access to observe inputs and outputs, it can be difficult to evaluate a model's internal logic.
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.67)
Optimal ablation for interpretability
Interpretability studies often involve tracing the flow of information through machine learning models to identify specific model components that perform relevant computations for tasks of interest. Prior work quantifies the importance of a model component on a particular task by measuring the impact of performing ablation on that component, or simulating model inference with the component disabled. We propose a new method, optimal ablation (OA), and show that OA-based component importance has theoretical and empirical advantages over measuring importance via other ablation methods. We also show that OA-based component importance can benefit several downstream interpretability tasks, including circuit discovery, localization of factual recall, and latent prediction.
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering
Pochinkov, Nicholas, Pasero, Ben, Shibayama, Skylar
The use of transformer-based models is growing rapidly throughout society. With this growth, it is important to understand how they work, and in particular, how the attention mechanisms represent concepts. Though there are many interpretability methods, many look at models through their neuronal activations, which are poorly understood. We describe different lenses through which to view neuron activations, and investigate the effectiveness in language models and vision transformers through various methods of neural ablation: zero ablation, mean ablation, activation resampling, and a novel approach we term 'peak ablation'. Through experimental analysis, we find that in different regimes and models, each method can offer the lowest degradation of model performance compared to other methods, with resampling usually causing the most significant performance deterioration. We make our code available at https://github.com/nickypro/investigating-ablation.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (4 more...)
Decomposing and Editing Predictions by Modeling Model Computation
Shah, Harshay, Ilyas, Andrew, Madry, Aleksander
How does the internal computation of a machine learning model transform inputs into predictions? In this paper, we introduce a task called component modeling that aims to address this question. The goal of component modeling is to decompose an ML model's prediction in terms of its components -- simple functions (e.g., convolution filters, attention heads) that are the "building blocks" of model computation. We focus on a special case of this task, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions; we demonstrate its effectiveness across models, datasets, and modalities. Finally, we show that component attributions estimated with COAR directly enable model editing across five tasks, namely: fixing model errors, ``forgetting'' specific classes, boosting subpopulation robustness, localizing backdoor attacks, and improving robustness to typographic attacks. We provide code for COAR at https://github.com/MadryLab/modelcomponents .
- Government > Regional Government > North America Government > United States Government (0.67)
- Information Technology (0.48)
- Health & Medicine (0.46)
A Deep Knowledge Distillation framework for EEG assisted enhancement of single-lead ECG based sleep staging
Joshi, Vaibhav, Vijayarangan, Sricharan, SP, Preejith, Sivaprakasam, Mohanasankar
Automatic Sleep Staging study is presently done with the help of Electroencephalogram (EEG) signals. Recently, Deep Learning (DL) based approaches have enabled significant progress in this area, allowing for near-human accuracy in automated sleep staging. However, EEG based sleep staging requires an extensive as well as an expensive clinical setup. Moreover, the requirement of an expert for setup and the added inconvenience to the subject under study renders it unfavourable in a point of care context. Electrocardiogram (ECG), an unobtrusive alternative to EEG, is more suitable, but its performance, unsurprisingly, remains sub-par compared to EEG-based sleep staging. Naturally, it would be helpful to transfer knowledge from EEG to ECG, ultimately enhancing the model's performance on ECG based inputs. Knowledge Distillation (KD) is a renowned concept in DL that looks to transfer knowledge from a better but potentially more cumbersome teacher model to a compact student model. Building on this concept, we propose a cross-modal KD framework to improve ECG-based sleep staging performance with assistance from features learned through models trained on EEG. Additionally, we also conducted multiple experiments on the individual components of the proposed model to get better insight into the distillation approach. Data of 200 subjects from the Montreal Archive of Sleep Studies (MASS) was utilized for our study. The proposed model showed a 14.3\% and 13.4\% increase in weighted-F1-score in 4-class and 3-class sleep staging, respectively. This demonstrates the viability of KD for performance improvement of single-channel ECG based sleep staging in 4-class(W-L-D-R) and 3-class(W-N-R) classification.